“Baby Mental Life: Study 2” was conducted on MTurk on 2018-08-04.
Our planned sample was 300 participants. We anticipated that roughly 80% of recruited participants would pass all of our attention checks, so we initially recruited 378 participants (since ~80% of 378 ≈ 302; note that for administrative purposes we needed to recruit participants in batches divisible by 9). After filtering out participants who failed at least one of our attention checks, we retained fewer than 300 participants, so we recruited an additional 16 participants, for a total of 394 people recruited. At each stage, we recruited women and men through separate studies, in hopes of acquiring a roughly equal split between genders.
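The batch arithmetic works out as follows (a quick sanity check, not part of the analysis pipeline; the variable names are mine):

```r
# smallest multiple of 9 such that ~80% of recruits passing still yields >= 300
n_target <- 300
pass_rate <- 0.80
batch_size <- 9
n_recruit <- ceiling((n_target / pass_rate) / batch_size) * batch_size
n_recruit  # 378
```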
We ended up with a final sample of 304 participants who passed our attention checks, 237 of whom came from unique GPS coordinates.
For this first pass, these data exclude any participant who shares an identical set of GPS coordinates (as recorded by Qualtrics) with another participant.
Each participant assessed children’s mental capacities at 13 target ages between the ages of 0 and 5 years. For each target, they rated 20 mental capacities on a scale from 0 (not at all capable) to 100 (completely capable).
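Treating each participant-target combination as one row, the retained data work out to 237 × 13 = 3081 rows of 20 ratings each; 3081 is the magic number that appears in a few code chunks below. As a quick check:

```r
# rows in the analysis dataset: one per participant-target combination
n_participants <- 237
n_targets <- 13
n_participants * n_targets  # 3081
```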
For more details about the study, see our preregistration here.
# load required libraries
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.0.0     ✔ purrr   0.2.5
✔ tibble  1.4.2     ✔ dplyr   0.7.6
✔ tidyr   0.8.1     ✔ stringr 1.3.1
✔ readr   1.1.1     ✔ forcats 0.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(langcog) # source: https://github.com/langcog/langcog-package
Attaching package: ‘langcog’
The following object is masked from ‘package:base’:
scale
library(psych)
Attaching package: ‘psych’
The following objects are masked from ‘package:ggplot2’:
%+%, alpha
library(lme4)
Loading required package: Matrix
Attaching package: ‘Matrix’
The following object is masked from ‘package:tidyr’:
expand
# set theme for ggplots
theme_set(theme_bw())
chosen_rot <- "oblimin"
# run source code (extra home-made functions)
source("./scripts/max_factors_efa.R")
source("./scripts/plot_fun.R")
source("./scripts/reten_fun.R")
source("./scripts/data_prep.R")
NAs introduced by coercion
attributes are not identical across measure variables; they will be dropped
Joining, by = "question_qualtrics"
# load in EFA results from study 1
efa_S1 <- readRDS("../study 1/s1_efa.rds")
heatmap_fun(efa_S1) +
labs(title = paste0("STUDY 1 Parallel Analysis (rotation: ", chosen_rot, ")"),
subtitle = "'% var.' indicates the amount of shared variance explained (total = 100%)")
Joining, by = "capacity"
Joining, by = "factor"
To test H1, we planned to conduct an exploratory factor analysis (EFA) collapsing across all 13 target characters (and treating an individual participant’s responses to each character as if they were independent data points) - see the preregistration for more details.
As with Study 1, we planned to examine three factor retention protocols in order to determine how many factors to retain: Parallel analysis, minimizing BIC, and a set of preset criteria outlined in Weisman et al. (2017). Here we look at each solution in turn.
We predicted that we’d see a similar factor structure to that found in Study 1.
We planned to examine oblimin-rotated solutions (which allow factors to correlate), but you could examine other rotation options by selecting a different rotation type here.
chosen_rot <- "oblimin" # preregistered: factors allowed to correlate
# chosen_rot <- "varimax" # orthogonal: factors forced not to correlate
# chosen_rot <- "none" # no rotation
reten_all_PA <- fa.parallel(d_all, plot = F); reten_all_PA
Parallel analysis suggests that the number of factors = 4 and the number of components = 4
Call: fa.parallel(x = d_all, plot = F)
Parallel analysis suggests that the number of factors = 4 and the number of components = 4
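fa.parallel() retains factors whose eigenvalues exceed those obtained from comparable random data. A minimal base-R sketch of the underlying idea, on simulated data (not our study data; everything here is made up for illustration):

```r
# minimal sketch of the parallel-analysis idea, on simulated (not study) data
set.seed(1)
n <- 300; p <- 6
X <- matrix(rnorm(n * p), n, p)
X[, 2] <- X[, 1] + rnorm(n, sd = 0.5)   # build in one genuinely correlated pair
obs_eig <- eigen(cor(X))$values          # observed eigenvalues
rand_eig <- rowMeans(replicate(50,       # average eigenvalues of pure noise
  eigen(cor(matrix(rnorm(n * p), n, p)))$values))
sum(obs_eig > rand_eig)                  # retain where observed > random
```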
reten_all_par <- reten_all_PA$nfact
efa_all_par <- fa(d_all, nfactors = reten_all_par, rotate = chosen_rot,
scores = "tenBerge", impute = "median")
Loading required namespace: GPArotation
heatmap_fun(efa_all_par) +
labs(title = paste0("Parallel Analysis (rotation: ", chosen_rot, ")"),
subtitle = "'% var.' indicates the amount of shared variance explained (total = 100%)")
Joining, by = "capacity"
Joining, by = "factor"
These factors look extremely similar to what we saw in Study 1 (see above). I (Kara) would say that H1 is strongly supported.
We could look at factor scores using the Study 2 EFA to see which capacities were attributed to which targets. This is not the primary way we planned to investigate this - this was listed as a “follow-up analysis” - but I’m putting it here so that it’s in close proximity to the EFA results for ease of interpretation.
scoresplot_fun(efa_all_par, target = "all (study 2)",
target_encoding = "numeric") +
scale_x_continuous(breaks = seq(0, 60, 12)) +
labs(title = "Parallel Analysis") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
Ignoring unknown aesthetics: y
scoresplot_fun(efa_all_par, target = "all (study 2)",
target_encoding = "numeric") +
scale_x_continuous(breaks = seq(0, 60, 12), trans = "sqrt") +
labs(title = "Parallel Analysis",
x = "age after square-root transformation (months)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
Ignoring unknown aesthetics: y
scoresplot_fun(efa_all_par, target = "all (study 2)",
target_encoding = "ordinal") +
labs(title = "Parallel Analysis",
x = "age (ordinal)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
Ignoring unknown aesthetics: y
And here’s a close look at all of the raw data (color-coded according to the Study 2 EFA results):
itemsplot_fun(efa_all_par, target = "all (study 2)") +
labs(title = "Parallel Analysis")
Joining, by = "capacity"
Joining, by = c("capacity", "factor", "order")
d_all %>%
rownames_to_column("subid_target") %>%
mutate(subid = gsub("_.*$", "", subid_target),
target = gsub("^.*_", "", subid_target),
target_num = recode(target,
"target00mo" = 0,
"target0Xmo" = 4/30,
"target01mo" = 1,
"target02mo" = 2,
"target04mo" = 4,
"target06mo" = 6,
"target09mo" = 9,
"target12mo" = 12,
"target18mo" = 18,
"target24mo" = 24,
"target36mo" = 36,
"target48mo" = 48,
"target60mo" = 60),
target_ord = recode_factor(target,
"target00mo" = "newborns",
"target0Xmo" = "4-day-olds",
"target01mo" = "1-month-olds",
"target02mo" = "2-month-olds",
"target04mo" = "4-month-olds",
"target06mo" = "6-month-olds",
"target09mo" = "9-month-olds",
"target12mo" = "12-month-olds",
"target18mo" = "18-month-olds",
"target24mo" = "2-year-olds",
"target36mo" = "3-year-olds",
"target48mo" = "4-year-olds",
"target60mo" = "5-year-olds")) %>%
select(-c(subid_target, target)) %>%
gather(capacity, response, -c(subid, starts_with("target"))) %>%
full_join(efa_all_par$loadings[] %>%
data.frame() %>%
rownames_to_column("capacity") %>%
gather(factor, loading, -capacity) %>%
group_by(capacity) %>%
top_n(1, abs(loading)) %>%
ungroup() %>%
arrange(factor, desc(abs(loading))) %>%
mutate(order = 1:20) %>%
select(capacity, factor, order)) %>%
# ggplot(aes(x = target_ord, y = response, color = factor)) +
ggplot(aes(x = target_num, y = response, color = factor)) +
facet_wrap(~ reorder(capacity, order)) +
geom_line(aes(group = subid), alpha = 0.1) +
geom_smooth(aes(group = capacity),
method = "lm", formula = "y ~ poly(x, 3)",
color = "black") +
scale_color_brewer(palette = "Set2", guide = "none") +
# scale_x_discrete("target age (ordinal)") +
scale_x_continuous("target age (months)", breaks = seq(0, 60, 12)) +
# scale_x_continuous("age after square-root transformation (months)",
# breaks = seq(0, 60, 12), trans = "sqrt") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
guides(color = guide_legend(override.aes = list(alpha = 1)))
Joining, by = "capacity"
reten_all_vss <- VSS(d_all, plot = F); reten_all_vss
Very Simple Structure
Call: vss(x = x, n = n, rotate = rotate, diagonal = diagonal, fm = fm,
n.obs = n.obs, plot = plot, title = title, use = use, cor = cor)
VSS complexity 1 achieves a maximimum of 0.85 with 1 factors
VSS complexity 2 achieves a maximimum of 0.95 with 2 factors
The Velicer MAP achieves a minimum of 0.02 with 4 factors
BIC achieves a minimum of -205.41 with 8 factors
Sample Size adjusted BIC achieves a minimum of -21.12 with 8 factors
Statistics by number of factors
reten_all_bic <- which.min(reten_all_vss$vss.stats$BIC)  # number of factors minimizing BIC
efa_all_bic <- fa(d_all, nfactors = reten_all_bic, rotate = chosen_rot,
scores = "tenBerge", impute = "median")
convergence not obtained in GPFoblq. 1000 iterations used.
heatmap_fun(efa_all_bic) +
labs(title = paste0("Minimizing BIC (rotation: ", chosen_rot, ")"),
subtitle = "'% var.' indicates the amount of shared variance explained (total = 100%)")
Joining, by = "capacity"
Joining, by = "factor"
A more complex picture, but the first 4 factors look similar to what we get through parallel analysis. (I think something similar happened with Study 1, but we should go back and compare.)
scoresplot_fun(efa_all_bic, target = "all (study 2)") +
labs(title = "Minimizing BIC") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
Ignoring unknown aesthetics: y
We’ll skip regression analyses and other plots for now.
reten_all_k <- reten_fun(d_all, rot_type = chosen_rot)
print(paste("Preset criteria suggest retaining", reten_all_k, "factors"))
[1] "Preset criteria suggest retaining 4 factors"
This gives the same solution as parallel analysis - 4 factors :)
I (Kara) made a big mistake in thinking through this: I thought we could use the predict.psych() function to project our 20-variable dataset into the factor space of Study 1’s 60-variable dataset, but we can’t!
I’ve tried to hack together a way to do this, by replacing all missing values at random (either within the full range of the scale, or around the midpoint, or near 0) - but I haven’t gotten anything to work. As you can see below (replacing missing values at random between 0-5), you see slight increases across all factors, and the most dramatic increase for Factor 4 - but I think this is because that factor is least well-defined in the Study 1 EFA solution? I think we need to focus on analyzing factor scores from our Study 2 EFA. I’m sorry for this mistake!
extra_var <- rownames(efa_S1$loadings)[!rownames(efa_S1$loadings) %in% rownames(efa_all_par$loadings)]
# fill in the 40 capacities from study 1 that were not measured in study 2
# (extra_var, computed above) with random integers between 0 and 5
temp <- d_all %>%
rownames_to_column("subid")
temp[extra_var] <- lapply(extra_var, function(v) round(runif(nrow(temp), 0, 5)))
temp <- temp %>%
column_to_rownames("subid")
scores_project <- predict.psych(object = efa_S1, data = temp)
scores_project %>%
data.frame() %>%
rownames_to_column("subid_target") %>%
mutate(subid = gsub("_.*$", "", subid_target),
target = gsub("^.*_", "", subid_target),
target_num = recode(target,
"target00mo" = 0,
"target0Xmo" = round(4/30, 3),
"target01mo" = 1,
"target02mo" = 2,
"target04mo" = 4,
"target06mo" = 6,
"target09mo" = 9,
"target12mo" = 12,
"target18mo" = 18,
"target24mo" = 24,
"target36mo" = 36,
"target48mo" = 48,
"target60mo" = 60),
target_ord = recode_factor(target,
"target00mo" = "newborns",
"target0Xmo" = "4-day-olds",
"target01mo" = "1-month-olds",
"target02mo" = "2-month-olds",
"target04mo" = "4-month-olds",
"target06mo" = "6-month-olds",
"target09mo" = "9-month-olds",
"target12mo" = "12-month-olds",
"target18mo" = "18-month-olds",
"target24mo" = "2-year-olds",
"target36mo" = "3-year-olds",
"target48mo" = "4-year-olds",
"target60mo" = "5-year-olds")) %>%
select(-subid_target) %>%
gather(factor, score, -c(subid, starts_with("target"))) %>%
ggplot(aes(x = target_num, y = score, color = factor)) +
facet_grid(~ factor) +
geom_line(aes(group = subid), alpha = 0.1) +
scale_x_continuous(breaks = seq(0, 60, 12)) +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
legend.position = "none") +
labs(title = "Kara's attempt to project into the Study 1 space",
subtitle = "Replaced all missing values (40 per participant) with a random integer between 0-5",
x = "target age (months)", y = "factor score")
Here’s a multilevel linear regression on these factor scores, with random intercepts and slopes (for target and factor) by participant. Target is coded as numeric, with only the linear contrast.
efa_all_par_scores <- efa_all_par$scores[] %>%
data.frame() %>%
rownames_to_column("subid") %>%
mutate(target = gsub("^.*_target", "target", subid),
ResponseId = gsub("_target.*$", "", subid),
target_num = recode(target,
"target00mo" = 0,
"target0Xmo" = round(4/30, 3),
"target01mo" = 1,
"target02mo" = 2,
"target04mo" = 4,
"target06mo" = 6,
"target09mo" = 9,
"target12mo" = 12,
"target18mo" = 18,
"target24mo" = 24,
"target36mo" = 36,
"target48mo" = 48,
"target60mo" = 60),
target_ord = recode_factor(target,
"target00mo" = "newborns",
"target0Xmo" = "4-day-olds",
"target01mo" = "1-month-olds",
"target02mo" = "2-month-olds",
"target04mo" = "4-month-olds",
"target06mo" = "6-month-olds",
"target09mo" = "9-month-olds",
"target12mo" = "12-month-olds",
"target18mo" = "18-month-olds",
"target24mo" = "2-year-olds",
"target36mo" = "3-year-olds",
"target48mo" = "4-year-olds",
"target60mo" = "5-year-olds")) %>%
select(-subid, -target) %>%
gather(factor, score, -starts_with("target"), -ResponseId) %>%
mutate_at(vars(factor), funs(factor))
contrasts(efa_all_par_scores$factor) <- contr.sum(reten_all_par)
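For reference, contr.sum() deviation-codes each of the first k−1 levels against the grand mean, with the last level coded −1 on every contrast, so each "factorN" coefficient below is a deviation from the grand mean:

```r
# sum (deviation) coding for a 4-level factor, as used for `factor` above
contr.sum(4)
# rows = levels, columns = contrasts:
#  1    0    0
#  0    1    0
#  0    0    1
# -1   -1   -1
colSums(contr.sum(4))  # every column sums to zero
```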
# r_all_par <- lmer(score ~ target_num * factor
# + (target_num + factor | ResponseId),
# efa_all_par_scores)
# summary(r_all_par, corr = F)
If we try to run the model above (our planned analysis), we get an error: “Model is nearly unidentifiable: very large eigenvalue.” The error message suggests rescaling variables, which solves the problem. Here I’ve rescaled by dividing age in months by 12, to get age in years. Let’s make sure to talk about this.
r_all_par_rescaled <- lmer(score ~ target_num * factor
+ (target_num + factor | ResponseId),
efa_all_par_scores %>%
mutate(target_num = target_num/12))
summary(r_all_par_rescaled, corr = F)
Linear mixed model fit by REML ['lmerMod']
Formula: score ~ target_num * factor + (target_num + factor | ResponseId)
Data: efa_all_par_scores %>% mutate(target_num = target_num/12)
REML criterion at convergence: 22685.5
Scaled residuals:
Min 1Q Median 3Q Max
-9.4103 -0.4328 0.0506 0.5125 5.3546
Random effects:
Groups Name Variance Std.Dev. Corr
ResponseId (Intercept) 0.3465 0.5887
target_num 0.0164 0.1281 -0.73
factor1 0.1932 0.4396 0.49 -0.40
factor2 0.2302 0.4798 -0.66 0.56 -0.41
factor3 0.2578 0.5077 0.24 0.00 -0.21 -0.61
Residual 0.2908 0.5393
Number of obs: 12324, groups: ResponseId, 237
Fixed effects:
Estimate Std. Error t value
(Intercept) -0.393496 0.038790 -10.144
target_num 0.278856 0.008870 31.437
factor1 0.084232 0.030704 2.743
factor2 -0.297423 0.033149 -8.972
factor3 0.265225 0.034858 7.609
target_num:factor1 -0.059692 0.005336 -11.187
target_num:factor2 0.210773 0.005336 39.501
target_num:factor3 -0.187955 0.005336 -35.224
As we predicted (H2), we see dramatic increases in mental capacity attributions across the age range (main effect of target_num).
And also as we predicted (H1), we see differences across factors in where newborns are perceived to start off: Relative to the grand mean, newborns are perceived to start off with more “negative emotions” (distress, frustration, etc.; main effect of factor1), fewer capacities in the domain of “cognition and control” (emotional control, self control, etc.; main effect of factor2), and relatively more “bodily sensations” (pain, fatigue, etc.; main effect of factor3). (We could recode this to look at factor4, or just eyeball it from the plot.) Also as predicted, the perceived changes across age vary dramatically across factors: “negative emotions” are perceived to change relatively little over development, “cognition and control” is perceived to change much more, and “bodily sensations” is perceived to change relatively little.
This is all very much in line with our preregistered hypotheses :)
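Because factor is sum-coded, the implied deviation for the omitted level (factor4) is just minus the sum of the other three coefficients. Plugging in the fixed effects from the summary above (no recoding or refitting needed):

```r
# intercept deviations (where newborns start off), from the model summary above
b_int <- c(factor1 = 0.084232, factor2 = -0.297423, factor3 = 0.265225)
-sum(b_int)    # implied factor4 intercept deviation: -0.052034

# age-slope deviations (change per year), from the model summary above
b_slope <- c(factor1 = -0.059692, factor2 = 0.210773, factor3 = -0.187955)
-sum(b_slope)  # implied factor4 slope deviation: 0.036874
```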
Now let’s see what the polynomial effects look like (again, looking at age in years instead of months). As we expected, including all of the polynomial effects as random slopes caused the model not to converge (I think we must be calculating df wrong), so I implemented our preregistered remedy and included only the linear effect as a random slope.
# adding polynomial effects
r_all_par_poly <- lmer(score ~ poly(target_num, 3) * factor
+ (poly(target_num, 1) + factor | ResponseId),
efa_all_par_scores %>%
mutate(target_num = target_num/12))
summary(r_all_par_poly, corr = F)
Linear mixed model fit by REML ['lmerMod']
Formula: score ~ poly(target_num, 3) * factor + (poly(target_num, 1) +
factor | ResponseId)
Data: efa_all_par_scores %>% mutate(target_num = target_num/12)
REML criterion at convergence: 20140.4
Scaled residuals:
Min 1Q Median 3Q Max
-10.3531 -0.4619 0.0027 0.5298 5.8874
Random effects:
Groups Name Variance Std.Dev. Corr
ResponseId (Intercept) 0.2260 0.4754
poly(target_num, 1) 516.2740 22.7217 -0.51
factor1 0.1966 0.4434 0.45 -0.39
factor2 0.2336 0.4833 -0.60 0.55 -0.41
factor3 0.2611 0.5110 0.29 0.00 -0.22 -0.61
Residual 0.2326 0.4823
Number of obs: 12324, groups: ResponseId, 237
Fixed effects:
Estimate Std. Error t value
(Intercept) 3.909e-15 3.119e-02 0.000
poly(target_num, 3)1 4.881e+01 1.553e+00 31.437
poly(target_num, 3)2 -2.006e+01 4.823e-01 -41.584
poly(target_num, 3)3 9.029e+00 4.823e-01 18.721
factor1 4.307e-15 2.977e-02 0.000
factor2 -2.640e-15 3.228e-02 0.000
factor3 -3.013e-15 3.403e-02 0.000
poly(target_num, 3)1:factor1 -1.045e+01 8.353e-01 -12.509
poly(target_num, 3)2:factor1 3.285e+00 8.353e-01 3.932
poly(target_num, 3)3:factor1 -1.259e+00 8.353e-01 -1.507
poly(target_num, 3)1:factor2 3.690e+01 8.353e-01 44.167
poly(target_num, 3)2:factor2 2.350e+00 8.353e-01 2.814
poly(target_num, 3)3:factor2 -6.747e+00 8.353e-01 -8.077
poly(target_num, 3)1:factor3 -3.290e+01 8.353e-01 -39.386
poly(target_num, 3)2:factor3 1.113e+01 8.353e-01 13.327
poly(target_num, 3)3:factor3 -5.075e+00 8.353e-01 -6.075
Correlation matrix not shown by default, as p = 16 > 12.
Use print(x, correlation=TRUE) or
vcov(x) if you need it
Lots to sift through here, but in general we see that the effect of target age on mental capacity attributions definitely has linear, quadratic, and cubic components, all three of which seem to vary substantially across factors. Pretty much all of these differences are “significant” (if you consider |t| > 2 to be “significant”) - for interpretation, I would need to look closer at the plot. Let’s pull it up again here, with blue lines approximating the formula y ~ poly(x, 3):
scoresplot_fun(efa_all_par, target = "all (study 2)",
target_encoding = "numeric") +
scale_x_continuous("age (months)", breaks = seq(0, 60, 12)) +
geom_smooth(method = "lm", formula = "y ~ poly(x, 3)",
color = "blue", size = 2)
Ignoring unknown aesthetics: y
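As a reminder of what poly() is doing in these models: it constructs orthogonal polynomial contrasts, so the linear, quadratic, and cubic columns are uncorrelated and their coefficients can be interpreted separately. A quick check on our 13 target ages (restated from the recoding above, in years):

```r
# the 13 target ages in years, as in the rescaled models above
target_ages <- c(0, 4/30, 1, 2, 4, 6, 9, 12, 18, 24, 36, 48, 60) / 12
P <- poly(target_ages, 3)  # orthogonal linear/quadratic/cubic terms
round(crossprod(P), 10)    # identity matrix: columns are orthonormal
```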
We can talk through these interpretations together - but I find the difference between Factor 2 (“cognition & control”) and Factor 4 (“positive emotions”) to be especially interesting!
ggplot(d_demo, aes(x = Duration/60)) +
geom_histogram(binwidth = 2) +
geom_vline(xintercept = median(d_demo$Duration/60), color = "blue", lty = 2) +
scale_x_continuous(breaks = seq(0, 10000, 4)) +
labs(title = "Duration of study (according to Qualtrics)",
subtitle = "Blue dotted line marks median",
x = "Duration (in minutes)",
y = "Number of participants")
ggplot(d_demo, aes(x = Age)) +
geom_histogram(binwidth = 2) +
geom_vline(xintercept = median(d_demo$Age), color = "blue", lty = 2) +
scale_x_continuous(breaks = seq(0, 10000, 4)) +
labs(title = "Participant age (self-reported)",
subtitle = "Blue dotted line marks median",
x = "Age (in years)",
y = "Number of participants")
ggplot(d_demo, aes(x = GenderSex)) +
geom_bar() +
labs(title = "Participant gender/sex (self-reported)",
x = "Gender/sex",
y = "Number of participants")
ggplot(d_demo, aes(x = gsub('(.{1,30})(\\s|$)', '\\1\n', RaceEthnicity_collapse))) +
geom_bar() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
labs(title = "Participant race/ethnicity (self-reported)",
x = "Race/ethnicity",
y = "Number of participants")
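The gsub() call in this and several of the following plots is a label-wrapping trick: it inserts a newline after each run of up to 30 characters that ends at whitespace (or end of string). On a made-up label (the wrap30 helper name is mine):

```r
# wrap long axis labels roughly every 30 characters, at whitespace boundaries
wrap30 <- function(s) gsub('(.{1,30})(\\s|$)', '\\1\n', s)
wrap30("Some college or an associate degree")
# -> "Some college or an associate\ndegree\n"
```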
ggplot(d_demo, aes(x = FirstLang)) +
geom_bar() +
labs(title = "Participant first language (self-reported)",
x = "Language",
y = "Number of participants")
ggplot(d_demo, aes(x = factor(Education,
levels = levels(d$Education),
labels = gsub('(.{1,30})(\\s|$)', '\\1\n',
levels(d$Education))))) +
geom_bar() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
labs(title = "Participant educational attainment (self-reported)",
x = "Highest level of education completed",
y = "Number of participants")
ggplot(d_demo, aes(x = Income)) +
geom_bar() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
labs(title = "Participant household income (self-reported)",
x = "Annual household income",
y = "Number of participants")
ggplot(d_demo, aes(x = HouseholdSize)) +
geom_histogram(binwidth = 1) +
geom_vline(xintercept = median(d_demo$HouseholdSize), color = "blue", lty = 2) +
scale_x_continuous(breaks = seq(0, 10000, 1)) +
labs(title = "Participant household size (self-reported)",
subtitle = "Blue dotted line marks median",
x = "Number of people in household (adults and children)",
y = "Number of participants")
ggplot(d_demo, aes(x = MaritalStatus)) +
geom_bar() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
labs(title = "Participant marital status (self-reported)",
x = "Marital status",
y = "Number of participants")
ggplot(d_demo, aes(x = Parent)) +
geom_bar() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
labs(title = "Participant parent status (self-reported)",
subtitle = "'NA' indicates response of 'Prefer not to say'",
x = "Parent status",
y = "Number of participants")
ggplot(d_demo %>% filter(Parent == "Yes"), aes(x = ChildrenNumber)) +
geom_histogram(binwidth = 1) +
geom_vline(xintercept = median(d_demo[d_demo$Parent == "Yes",]$ChildrenNumber, na.rm = T),
color = "blue", lty = 2) +
scale_x_continuous(breaks = seq(0, 10000, 1)) +
labs(title = "Number of children among parents (self-reported)",
subtitle = "Blue dotted line marks median",
x = "Number of children (among parents)",
y = "Number of participants")
ggplot(d_demo %>% filter(Parent == "Yes"),
aes(x = factor(ChildrenOldestAge_collapse,
levels = levels(d_demo$ChildrenOldestAge_collapse),
labels = gsub('(.{1,30})(\\s|$)', '\\1\n',
levels(d_demo$ChildrenOldestAge_collapse))))) +
geom_bar() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
labs(title = "Age of oldest child among parents (self-reported)",
x = "Age of child in years (among parents)",
y = "Number of participants")
ggplot(d_demo %>% filter(Parent == "Yes"),
aes(x = factor(ChildrenYoungestAge_collapse,
levels = levels(d_demo$ChildrenYoungestAge_collapse),
labels = gsub('(.{1,30})(\\s|$)', '\\1\n',
levels(d_demo$ChildrenYoungestAge_collapse))))) +
geom_bar() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
labs(title = "Age of youngest child among parents (self-reported)",
x = "Age of child in years (among parents)",
y = "Number of participants")